Abstract

This article is an extended version of a paper presented at the WSOM'2012 conference [1]. We
display a combination of factorial projections, the SOM algorithm and graph techniques applied to
a text mining problem. The corpus contains 8 medieval manuscripts which were used to teach
arithmetic techniques to merchants.

Among the techniques for Data Analysis, those used for Lexicometry (such as Factorial Analysis)
highlight the discrepancies between manuscripts. The reason for this is that they focus on the
deviation from the independence between words and manuscripts. Still, we also want to discover 
and characterize the common vocabulary among the whole corpus. 

Using the properties of stochastic Kohonen maps, which define neighborhood between inputs in 
a non-deterministic way, we highlight the words which seem to play a special role in the vocabulary. 
We call them fickle and use them to improve both Kohonen map robustness and significance of
FCA visualization. Finally we use graph algorithms to exploit this fickleness for classification of
words.


Introduction 

Historical Context 

One approach to understanding the evolution of science is the study of the evolution of the language
used in a given field. That is why we would like to pay attention to the vernacular texts
dealing with practical arithmetic and written for the instruction of merchants. Such texts are
known since the XIIIth century, and from that century onwards, the vernacular language appears
more and more as the medium of practical mathematics.

Treatises on arithmetical education were therefore mostly conceived and written in local languages
(not only in French but also in Italian, Spanish, English and German). In this
process, the XVth century appears as a time of exceptional importance because we can study the
inheritance of two hundred years of practice. For the authors of these texts, the purpose was not
only to teach merchants, but also to develop knowledge in vernacular language. Their books
circulated far beyond the shopkeepers' world, to the humanists' circles for example.


An objective of historical research: the study of specialized languages 

The work previously done by historians [2] consisted in the elaboration of a dictionary of the
lexical forms found in all the treatises, in order to identify the different features of the mathematical
vernacular language at that time. This being done, we have worked on the contexts of some 
especially important words in order to understand the lexicon in all its complexity. In other words, 
we would like to determine the common language that forms the specialized language beyond the 
specificity of each text.

Manuscript and title                                  Date         Author             Occurrences  Words  Hapax
----------------------------------------------------  -----------  -----------------  -----------  -----  -----
Bib. nat. Fr. 1339                                    ca. 1460     anonymous                32077   2335   1229
Bib. nat. Fr. 2050                                    ca. 1460     anonymous                39204   1391    544
Cesena Bib. Mai. S-XX VI-6, Traicte de la praticque   1471?        Mathieu Prehoude?        70023   1540    635
Bibl. nat. Fr. 1346, Commercial appendix of           1484         Nicolas Chuquet          60814   2256    948
  Triparty en la science des nombres
Med. Nantes 456                                       ca. 1480-90  anonymous                50649   2252    998
Bib. nat. Arsenal 2904, Kadran aux marchans           1485         Jean Certain             33238   1680    714
Bib. St. Genv. 3143                                   1471         Jean Adam                16986   1686    895
Bib. nat. Fr. Nv. Acq. 10259                          ca. 1500     anonymous                25407   1597    730


Table 1: Corpus of texts and main lexicometric features. The number of occurrences is the total number of words
including repetitions, the number of words is the number of distinct words, and hapax are words appearing once in a
text.


Outline of this work 

Among the techniques for Data Analysis, those used for Lexicometry (such as Factorial Analysis)
highlight the discrepancies between manuscripts. The reason for this is that they focus on the
deviation from the independence between words and manuscripts. Still, we also want to discover
and characterize the common vocabulary among the whole corpus. That is why we introduce a
new tool, which combines the properties of Factorial Correspondence Analysis and Stochastic
Self-Organizing Maps. That leads to the definition of fickle pairs and fickle words. Fickle words can be
seen as the common vocabulary we are looking for, and prove to be a good basis for a
new visualization with the help of graph theory.

In part 1 we first focus on the definition of the corpus: the texts, the pre-processing, and the
protocol which is traditionally used in Humanities and Social Sciences to handle such data. Then
(part 2), we design the tools: 'fickle pairs' and 'fickle words', robust Kohonen maps, improved
FCA, graphs of relations between words based on fickleness. We explain the algorithms involved
and display the results on the corpus. Finally (part 3), we give a brief analysis and comments on
these results.

1. Text, Corpus and protocol 

In order to delimit a coherent corpus among the whole European production of practical calculation
education books, we have chosen to pay attention to those treatises which are sometimes
qualified as commercial (marchand in French) and which were written in French between 1415
and about 1500. Note that this corpus has already been studied by […], […] and […]. In this way, our
corpus follows the rules of discourse analysis: homogeneity, contrastiveness and diachronism.
For further explanation about the texts, the methodology and the purpose of the analysis, see […]; for further
explanation about the corpus and a wider explication of the analysis, see […].
It contains eight treatises on the same topic, written in the same language by different XVth
century authors. Table 1 describes some of the lexicometric characteristics
of the corpus and shows how unbalanced it is.

1.1. Humanities and Social Sciences traditional protocol 

Traditionally, on this kind of textual data, researchers in Humanities and Social Sciences work
on statistical specificity and contextual concordances, since they allow an easy discovery of the
major lexical splits within the texts of the corpus, while remaining close to the meanings of the
different forms.

Then, the factorial and clustering methods, combined with co-occurrence analysis (see […]), help
us to cluster the texts without breaking the links with semantic analysis.

However, such a method of data processing requires a preliminary treatment of the corpus, the
lemmatization […]. It consists in gathering the different inflected forms of a given word as a single
item. It allows us to work at many different levels of meaning, depending upon the granularity 
adopted: forms, lemma, syntax. 

We can justify this methodological choice here by its effect on the dispersion of the various 
forms which can be linked to the same lemma, a high degree of dispersion making the comparison 
between texts more difficult. It must also be remembered that in the case of medieval texts, this 
dispersion is increased by the lack of orthographic norms. In our case, this process has an important 
quantitative consequence on the number of forms in the corpus, which declines from 13516 forms 
to 9463, a reduction of some 30%. 


This process has been achieved with a particular attention to the meaning of each word in order 
to suppress ambiguities: a good example is the French word pouvoir which can be a verb translated 
by "can" or "may", and which is also a substantive translated by "power". 

Finally, to cluster the manuscripts, we have kept only the 219 words with the highest
frequencies. The set of words selected that way for text classification relates to mathematical
aspects, such as operations, numbers and their manipulations, as well as to didactic aspects. Their 
higher frequencies reflect the fact that they are the language of the mathematics as they appear 
to be practiced in these particular texts. 

Thus, in what follows, the data are displayed in a contingency table $T$ with $I = 219$ rows (the
words) and $J = 8$ columns (the manuscripts), so that the entry $t_{i,j}$ is the number of occurrences of
word $i$ in manuscript $j$.


1.2. Use of Factorial Correspondence Analysis (FCA) 

Factorial Correspondence Analysis is one of the factorial methods, which consist in applying an
orthogonal transformation to the data in order to supply the user with a simplified representation of
high-dimensional data, as defined in […]. The most popular of these factorial methods is
Principal Component Analysis, which deals with real-valued variables and supplies, for example, the best
two-dimensional representation of a high-dimensional dataset, by retaining the first two eigenvectors
of the covariance matrix.

Factorial Correspondence Analysis (see […] or [13]) is a variant of Principal Component Analysis,
designed to deal with categorical variables. Let us consider two categorical variables with
respectively $I$ and $J$ items and the associated contingency table $T$, where entry $t_{i,j}$ is the number
of co-occurrences of item $i$ for the row variable and item $j$ for the column variable. The rows and the
columns are scaled to sum to 1 and normalized in order to be treated simultaneously, by defining

$$ t^{norm}_{i,j} = \frac{t_{i,j}}{\sqrt{\sum_i t_{i,j} \, \sum_j t_{i,j}}} \tag{1} $$



To achieve the FCA, two Principal Component Analyses are made over the normalized table
$T^{norm}$ (providing a representation of the rows) and its transposed table (providing a representation
of the columns). The main property of FCA is that both representations can be superposed, since
their principal axes are strongly correlated. The proximity between items is significant, regardless
of whether they stand for row items or column items, except in the center of the map.

In our case, the rows are the words ($I = 219$) and the columns are the manuscripts ($J = 8$).
Figures 2 and 3 show the projection of the data on the first four factorial axes.
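As a minimal sketch of the normalization in equation (1), the following Python function assumes that the normalized entry is $t_{i,j}$ divided by the square root of the product of the corresponding row and column sums. The function name `normalize_table` and the toy counts are illustrative assumptions, not taken from the corpus.

```python
import math

def normalize_table(T):
    """Return T_norm with entries t_ij / sqrt(row_sum_i * col_sum_j).

    Assumes every row and every column of T has a positive sum.
    """
    row_sums = [sum(row) for row in T]
    col_sums = [sum(col) for col in zip(*T)]
    return [
        [t / math.sqrt(r * c) for t, c in zip(row, col_sums)]
        for row, r in zip(T, row_sums)
    ]

# Toy contingency table: 3 words (rows) x 2 manuscripts (columns), made-up counts.
T = [[10, 2], [3, 7], [5, 5]]
T_norm = normalize_table(T)

# Sanity check: T_norm maps the vector of sqrt(column sums) onto the vector of
# sqrt(row sums). This is the trivial factor (singular value 1) of the table.
sqrt_cols = [math.sqrt(sum(col)) for col in zip(*T)]
image = [sum(a * b for a, b in zip(row, sqrt_cols)) for row in T_norm]
```

The check at the end reflects a general property of this normalization: the first singular pair of the normalized table is trivial, so the informative factorial axes are the following ones.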

The first two factors (43.94% of the total variance) show the diversity of the cultural heritages
which have built the language of these treatises. The first factor (25.03%) discriminates between
the university legacy on the right, and the tradition of mathematical problems on the left.

Figure 2: Projection on the first two factors of the FCA. The eight texts appear in frames, a few words are displayed
while the remaining ones are simply shown as dots, for the sake of readability.


On the left, we can observe a group whose strong homogeneity comes from its orientation
towards mathematical problems (trouver, that is to say "to find", demander, which we can translate
as "to ask") and their iteration (item, idem). That vocabulary can be found most often in both the
appendix of Triparty en la science des nombres (Nicolas Chuquet) and Le Traicte de la praticque.
Furthermore, there are more verbal forms on this side of the axis than on the other: we can
find verbs like requerir, which means "to require", convenir "to agree", faire "to do". Some of them
are prescriptive, such as devoir "to have to" or vouloir "to want", while others introduce
examples, such as montrer "to show". All these texts contain a lot of mathematical problems and are in a
way practical. On the right, the texts of BnF. fr. 1339 and Med. Nantes 456 are clearly more
representative of the university culture, containing sequences of Latin words.

The second axis (17.91% of the variance) is mostly characterized by the manuscript of BNF. fr. 
2050 and also by Kadran aux marchans. It displays words of Italo-Provencal origin, like nombrateur 
which refers to the division’s numerator. Designations of the fraction and operation of division 
take a significant part of the information while the most contributory words (for ex. figurer "to 
draw") allow us to examine another dimension of these works: the graphical representation as a 
continuation of writing. 

The following factors (3, 4 and beyond) show the lexicon that seems to be more related to the
singularity of some manuscripts. The importance of Nicolas Chuquet's text in the inertia of factors 3 and 4
singles out this book on the plane (see Figure 3) in relation to the rest of the corpus.

Figure 3: Projection on third and fourth factors of the FCA.


With the manuscript of Nantes 456, on the left, factor 3 highlights a vocabulary of some technical
accuracy, in any case rare in the rest of the corpus, like quotiens "quotient", anteriorer "to put
before". On the right, there is a very diversified vocabulary, associated with manuscript 10259, which is
a well organized compilation of a copy of Kadran aux marchans and of a lot of problems whose
origin has not been fully identified.

Correspondence Analysis displays the particularities of each text, but leaves untouched some
more complex elements of the data. For instance, we have to look at the third factor to understand
that the Triparty en la science des nombres (Nicolas Chuquet) and the Traicte de la praticque
use different university mathematical cultures. These two treatises do not merely copy university
algorithms as they were taught at university at that time; they have their own originality.

Moreover, we cannot assert that the words which appear in the center of the graph represent a 
’common vocabulary’: as a matter of fact, we should analyze all the successive factors in order to 
build the list of words constituting the ’common vocabulary’. It is a very cumbersome task. 


1.3. Kohonen Maps 

SOM-based algorithms have very often been used for text mining purposes. Oja and Kaski's seminal
book […] provides a lot of examples in this field. A major tool for that purpose is the WEBSOM
method and software (see http://websom.hut.fi/websom/), as defined for instance in […]. Other
important papers (among hundreds) are […], [23].

Figure 4: Building of the extended, symmetrized table in the KORRESP algorithm

Most of them aim at classification and clustering based on keywords: they highlight the main
features, associate documents with their most characteristic words, and define proximities in order to
build clusters and hierarchies of documents. Techniques such as WEBSOM are especially
designed to deal with massive document collections.

Our purpose is very different, since we have very few documents and since we look for a subset of
words which are not "specific" to some manuscript but, on the contrary, belong to a common vocabulary.


Factorial Correspondence Analysis (FCA) suffers from some limitations, as explained in section 1.2.
To overcome them, we use a variant of the SOM algorithm which deals with the same kind
of data, i.e. a contingency table. This variant of SOM was defined in [21] or [23] and we refer to
it as the KORRESP algorithm. Let us recall this definition.

The data are displayed, as explained in the fourth paragraph of section 1.2, in a contingency table with
$I = 219$ rows and $J = 8$ columns. The data are normalized by applying equation (1), exactly in the same way as for
Factorial Correspondence Analysis. The normalized contingency table is denoted by $T^{norm}$, where:

$$ t^{norm}_{i,j} = \frac{t_{i,j}}{\sqrt{\sum_i t_{i,j} \, \sum_j t_{i,j}}} $$



We consider a Kohonen map, and associate with each unit $u$ a code-vector $C_u$ with $(J + I)$
components. The first $J$ components evolve in the space of the rows (the words), while the last $I$
components belong to the space of the columns (the manuscripts).

Let us denote

$$ C_u = (C_J, C_I)_u = (C_{J,u}, C_{I,u}) \tag{2} $$

to highlight the structure of the code-vector $C_u$.


We use the SOM algorithm as a double learning process, by alternately drawing a row
(a word) and a column (a manuscript).

When we draw a row $i$, we associate with it the column $j(i)$ that maximizes the coefficient $t^{norm}_{i,j}$:

$$ j(i) = \arg\max_j t^{norm}_{i,j} = \arg\max_j \frac{t_{i,j}}{\sqrt{\sum_i t_{i,j} \, \sum_j t_{i,j}}} \tag{3} $$

that maximizes the conditional probability of $j$ given $i$. We then create an extended
$(J + I)$-dimensional row vector $X = (i, j(i)) = (X_J, X_I)$. See Figure 4.

Subsequently, we look for the closest of all the code-vectors, in terms of the Euclidean distance
restricted to the first $J$ components. Denote by $u_0$ the winning unit. Next we move the code-vector
of the unit $u_0$ and of its neighbors towards the extended vector $X = (i, j(i))$, as per the customary
Kohonen law. Let us write down the formal definition:

$$ u_0 = \arg\min_u \| X_J - C_{J,u} \| \tag{4} $$

$$ C_u^{new} = C_u^{old} + \varepsilon \, \sigma(u, u_0) \, (X - C_u^{old}) \tag{5} $$

where $\varepsilon$ is the adaptation parameter (positive, decreasing with time), and $\sigma$ is the neighborhood
function, such that $\sigma(u, u_0) = 1$ if $u$ and $u_0$ are neighbors in the Kohonen network, and $\sigma(u, u_0) = 0$
if not.

The reason for associating a row and a column in such a way is to keep the row-column associations
which are realized in classical FCA by the fact that the principal axes of both Principal Component
Analyses are strongly correlated.

The procedure is the same when we draw a column $j$ with dimension $I$ (a column of $T^{norm}$).
We associate with it the row $i(j)$ that maximizes the coefficient $t^{norm}_{i,j}$:

$$ i(j) = \arg\max_i t^{norm}_{i,j} = \arg\max_i \frac{t_{i,j}}{\sqrt{\sum_i t_{i,j} \, \sum_j t_{i,j}}} \tag{6} $$

that maximizes the conditional probability of $i$ given $j$. We then create an extended
$(J + I)$-dimensional column vector $Y = (i(j), j) = (Y_J, Y_I)$.

We then seek the code-vector that is the closest, in terms of the Euclidean distance restricted
to the last $I$ components. Let $v_0$ be the winning unit. Next we move the code-vector of the unit $v_0$
and of its neighbors towards the extended vector $Y = (i(j), j)$, as per the customary Kohonen law.
Let us write down the formal definition:

$$ v_0 = \arg\min_v \| Y_I - C_{I,v} \| \tag{7} $$

$$ C_v^{new} = C_v^{old} + \varepsilon \, \sigma(v, v_0) \, (Y - C_v^{old}) \tag{8} $$

where $\varepsilon$ and $\sigma$ are defined as before.

This two-step computation carries out a Kohonen classification of the rows (the words),
together with a classification of the columns, maintaining all the while the associations of both
rows and columns.

We can sum up the definition of the KORRESP algorithm:

• normalization of the rows and of the columns in the same way as in the FCA computation,

• definition of an extended data table by associating with each row the most probable column
and with each column the most probable row,

• simultaneous classification of the rows and of the columns onto a Kohonen map, by using the
rows of the extended data table as input for the SOM algorithm.
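The three steps above can be sketched in a few lines of Python. This is only an illustrative toy version under stated assumptions, not the authors' implementation: the function name `korresp`, the random initialization, the linear learning-rate schedule, the strict alternation between row and column draws and the default grid size are all hypothetical choices. The table is assumed to have positive row and column sums.

```python
import math
import random

def korresp(T, side=3, n_iter=2000, seed=0):
    """Toy KORRESP run; returns the grid positions of the rows and of the columns."""
    rng = random.Random(seed)
    I, J = len(T), len(T[0])
    r = [sum(row) for row in T]
    c = [sum(col) for col in zip(*T)]
    # Step 1: normalization, assumed to be t_ij / sqrt(row_sum_i * col_sum_j).
    N = [[T[i][j] / math.sqrt(r[i] * c[j]) for j in range(J)] for i in range(I)]

    def column(j):
        return [N[i][j] for i in range(I)]

    # Step 2: extended vectors, J row-profile components then I column-profile ones.
    X = [N[i] + column(max(range(J), key=lambda j: N[i][j])) for i in range(I)]
    Y = [N[max(range(I), key=lambda i: N[i][j])] + column(j) for j in range(J)]

    units = [(a, b) for a in range(side) for b in range(side)]
    codes = {u: [rng.random() for _ in range(J + I)] for u in units}

    def winner(vec, lo, hi):
        # Closest code-vector, Euclidean distance restricted to components lo..hi-1.
        return min(units, key=lambda u: sum((vec[k] - codes[u][k]) ** 2
                                            for k in range(lo, hi)))

    # Step 3: online SOM training, alternating a row draw and a column draw.
    for t in range(n_iter):
        eps = 0.5 * (1 - t / n_iter) + 0.01      # assumed decreasing schedule
        if t % 2 == 0:
            vec = rng.choice(X)
            w = winner(vec, 0, J)                # rows compete on the first J components
        else:
            vec = rng.choice(Y)
            w = winner(vec, J, J + I)            # columns compete on the last I components
        for u in units:
            if max(abs(u[0] - w[0]), abs(u[1] - w[1])) <= 1:   # winner and adjacent units
                codes[u] = [cu + eps * (x - cu) for cu, x in zip(codes[u], vec)]

    rows = [winner(x, 0, J) for x in X]
    cols = [winner(y, J, J + I) for y in Y]
    return rows, cols
```

Calling `korresp(T)` on a small contingency table returns one grid position per word and one per manuscript, so that words and manuscripts can be read on the same map.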

After convergence of the training step, the items of the rows and of the columns are simultaneously
classified. In our example, one can see proximity between words, between texts, and between
words and texts. It is the same goal as in Factorial Correspondence Analysis. The advantage is
that it is not necessary to examine several projection planes: the whole information can be read
on the Kohonen map.

We display below (Figure 5) the SOM map which simultaneously represents the words and the
texts. For this map, as for the remainder of this paper, we use the online algorithm, a 10 x 10 grid
and the following simple neighborhood function: 1 for the eight nodes adjacent to the selected one
(fewer if we are along one edge of the map) and 0 for the others.

One can observe that the interpretation (see Figure 6) is very similar to the interpretation that
could be made from the Factorial Correspondence Analysis projections. But, as an example of the
robustness problem, we can compare two different Kohonen maps (in Figures 5 and 7) and the
respective positions of the words raison "reason" and dire "to say", very far from each other in
the first map while neighboring in the second one.

Figure 5: Example of Kohonen Map. Manuscripts are in bold. Notice that raison (9,7) and dire (3,7) are far apart
from each other.


The Kohonen algorithm is stochastic: several runs can give different results,
and these differences can be troublesome. Hence the idea of repeating the runs
in order to separate stable and robust results from purely stochastic behavior. In the following, we study
the variability of the maps, which provides new information.

Figure 6: Interpretation of the Kohonen map (Figure 5); the diagonal opposes university and practical vocabularies


2. Getting extra information through the extraction of fickle words 

In its classical presentation ([…], [21]), the SOM algorithm is an iterative algorithm, which takes
as input a dataset $x_i$, $i \in \{1, \dots, N\}$, and computes code-vectors $m_u$, $u \in \{1, \dots, U\}$, which define
the map.

We know that self-organization is reached at the end of the algorithm, which implies that close
data in the input space have to belong to the same class or to neighboring classes, that is to say
that they are projected on the same prototype or on neighboring prototypes on the map. In what
follows, we call neighbors data that belong either to the same unit or to two adjacent units. But
the converse does not hold: for a given run of the algorithm, two given data can be neighbors on the
map while they are not close in the input space. That drawback comes from the fact that there is no
perfect fit between a two-dimensional map and the data space (except when the intrinsic dimension
is exactly 2). Moreover, as we have just noted, since the SOM algorithm is a stochastic one, the resulting maps
can differ from one run to another. How can this difficulty be overcome?

In fact, we can use this drawback to improve the interpretation and the analysis of relations
between the studied words. Our hypothesis is that the repetitive use of this method can help us
to identify words that strongly attract or repel each other, as well as fickle pairs.

Figure 7: Another example of Kohonen Map. This time, raison (4,5) and dire (3,5) are neighbors.


2.1. Neighborhood and robustness of information on Kohonen maps 

We address the issue of computing a reliability level for the neighboring (or non-neighboring)
relations in a SOM map. More precisely, if we consider several runs of the SOM algorithm, for a
given size of the map and for a given data set, we observe that most pairs are almost always
neighbors or almost never neighbors. But there are also pairs whose associations look random. These
pairs are called fickle pairs. This question was addressed by […] in a bootstrap framework.

According to their paper, we can define: $NEIGH^l_{i,j} = 0$ if $x_i$ and $x_j$ are not neighbors in the
$l$-th run of the algorithm, and $NEIGH^l_{i,j} = 1$ if $x_i$ and $x_j$ are neighbors in the $l$-th run of the
algorithm, where $(x_i, x_j)$ is a given pair of data and $l$ is the index of the observed run of the SOM
algorithm.

Then $Y_{i,j} = \sum_{l=1}^{L} NEIGH^l_{i,j}$ is the number of times the data $x_i$ and $x_j$ are neighbors over
$L$ different, independent runs. The stability index $\mathcal{M}_{i,j}$ is defined as the average of $NEIGH^l_{i,j}$
over all the runs ($l = 1, \dots, L$), i.e.

$$ \mathcal{M}_{i,j} = \frac{\sum_{l=1}^{L} NEIGH^l_{i,j}}{L} = \frac{Y_{i,j}}{L} \tag{9} $$



The next step is to compare it to the value it would have if the data $x_i$ and $x_j$ were neighbors
by chance in a completely random way.


So we can use a classical statistical test to check the significance of the stability index $\mathcal{M}_{i,j}$.
Let $U$ be the number of units on the map. If edge effects are not taken into account, the number of
units involved in a neighborhood region (as defined here) is 9 in a two-dimensional map. So for a
fixed pair of data $x_i$ and $x_j$, the probability of being neighbors in a random way is equal to $9/U$ (it
is the probability for $x_j$ to be a neighbor of $x_i$ by chance once the class $x_i$ belongs to is determined).

As $Y_{i,j} = \sum_{l=1}^{L} NEIGH^l_{i,j}$ is the number of times the data $x_i$ and $x_j$ are neighbors over $L$
different, independent runs, it is easy to see that $Y_{i,j}$ follows a Binomial distribution with
parameters $L$ and $9/U$.

Using the classical approximation of the Binomial distribution by a Gaussian one ($L$ is large and
$9/U$ not too small), we can build the critical region of the test of the null hypothesis $H_0$: "$x_i$ and $x_j$
are neighbors by chance" against the hypothesis $H_1$: "the fact that $x_i$ and $x_j$ are neighbors or not is
significant".

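We conclude that the critical region for a test level of 5% based on $Y_{i,j}$ is

$$ \left] -\infty,\; L\frac{9}{U} - 1.96\sqrt{L\frac{9}{U}\left(1 - \frac{9}{U}\right)} \right[ \;\cup\; \left] L\frac{9}{U} + 1.96\sqrt{L\frac{9}{U}\left(1 - \frac{9}{U}\right)},\; +\infty \right[ \tag{10} $$

For the frequency (i.e. the stability index) $\mathcal{M}_{i,j} = Y_{i,j}/L$, the critical region is

$$ \left] -\infty,\; \frac{9}{U} - 1.96\sqrt{\frac{9}{UL}\left(1 - \frac{9}{U}\right)} \right[ \;\cup\; \left] \frac{9}{U} + 1.96\sqrt{\frac{9}{UL}\left(1 - \frac{9}{U}\right)},\; +\infty \right[ \tag{11} $$

To simplify the notations, let us put

$$ A = \frac{9}{U} \quad \text{and} \quad B = 1.96\sqrt{\frac{9}{UL}\left(1 - \frac{9}{U}\right)}. \tag{12} $$
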


Then, practically, for each pair of words $(x_i, x_j)$, we compute the index $\mathcal{M}_{i,j} = Y_{i,j}/L$, and
apply the following rule:


• if their index is greater than $A + B$, they are almost always neighbors in a significant way:
the words attract each other.

• if their index lies between $A - B$ and $A + B$, their proximity is due to randomness:
they are a fickle pair.

• if their index is less than $A - B$, they are almost never neighbors: the words repel each
other.
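The rule above translates directly into code. The sketch below computes $A$ and $B$ from the number of units $U$ and the number of runs $L$, then classifies a stability index. The function names are illustrative, and the value $L = 40$ in the example is an assumption: the number of runs is not stated explicitly here, but this value reproduces the upper threshold 0.1787 quoted in section 2.2 for a 10 x 10 map.

```python
import math

def thresholds(n_units, n_runs, z=1.96):
    """Return (A, B) with A = 9/U and B = z * sqrt(A * (1 - A) / L)."""
    a = 9 / n_units
    b = z * math.sqrt(a * (1 - a) / n_runs)
    return a, b

def classify_pair(stability, a, b):
    """Apply the three-way rule to a stability index M_ij = Y_ij / L."""
    if stability > a + b:
        return "attract"
    if stability < a - b:
        return "repel"
    return "fickle"

# 10 x 10 map (U = 100); L = 40 runs is an assumed value (see the lead-in).
A, B = thresholds(n_units=100, n_runs=40)
```

With these values, a pair that is never observed as neighbors over the runs is classified as repelling, while a pair that is a neighbor in a small fraction of the runs remains fickle.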

2.2. Identification of fickle pairs 

We run KORRESP $L$ times and store the result in a matrix $\mathcal{M}$ of size $(N + p) \times (N + p)$. The
value stored in a given cell $(i, j)$ is the proportion of maps where $i$ and $j$ are neighbors.
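A minimal sketch of how such a matrix can be accumulated is given below. It assumes that each run is summarized by the grid position (row, column) of every item on the map, and that two items are neighbors when they fall on the same unit or on two adjacent units (including diagonals). The function name `neighbor_frequency` and this input format are illustrative assumptions.

```python
def neighbor_frequency(runs):
    """runs: list of L maps, each a list of (row, col) grid positions, one per item.

    Returns the matrix M where M[i][j] is the proportion of runs in which
    items i and j are neighbors (same unit or adjacent units).
    """
    n = len(runs[0])
    counts = [[0] * n for _ in range(n)]
    for positions in runs:
        for i in range(n):
            for j in range(n):
                (a, b), (c, d) = positions[i], positions[j]
                if max(abs(a - c), abs(b - d)) <= 1:
                    counts[i][j] += 1
    return [[count / len(runs) for count in row] for row in counts]

# Two toy runs with three items each (made-up positions).
runs = [[(0, 0), (0, 1), (5, 5)], [(0, 0), (3, 3), (4, 4)]]
M = neighbor_frequency(runs)
```

In this toy example the first two items are neighbors in one run out of two, so their stability index is 0.5.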

Table 8 displays an example of the first nine rows and columns of such a matrix. We have
highlighted with colors three different situations, according to the theoretical study mentioned
above:

• Black cells stand for pairs that are neighbors with high probability (proximity happens with
frequency greater than $A + B$, here 0.1787).

• White cells stand for pairs that are not neighbors with high probability (proximity happens
with frequency less than $A - B$, here 0.0014).

• Grey cells are not conclusive: they are the fickle pairs.

Table 8: Frequency of neighborhood matrix (excerpt)

If we rearrange the order of rows and columns through Bertin permutations, we immediately
make remarkable clustering properties appear (see Table 9).

Table 9: Frequency of neighborhood matrix (same excerpt as Table 8, with rows and columns reorganized)

For each word, this treatment gives us a list of words that can roughly be grouped
around two poles: the strongly associated ones and the almost never associated ones. Between these
two extremes lies a central group that is difficult to characterize.

This technique could be used for classification, but here our main objective is a bit different:
we are mostly interested in a characterization of the words that have high mobility in Kohonen maps,
which we call fickle words.

2.3. From fickle pairs to fickle words 

We call fickle a word which belongs to a huge number of fickle pairs, that is, a word whose number
of fickle pairs exceeds a threshold $\Theta$.

Unfortunately, it is not an easy task to find an appropriate threshold $\Theta$. Here we have
decided to fix it according to data interpretation. The 30 ficklest words, whose number of safe
neighbors/non-neighbors (non-fickle pairs) is between 89 and 119, are displayed in Figure 10.

contraire "opposite" (89)    regle de trois "rule of three" (104)  depenser "to expend" (112)
doubler "to double" (89)     savoir "to know" (105)                racine "root" (113)
falloir "to need" (93)       partir "to divide" (105)              chose "thing" (113)
meme "same, identical" (93)  position "position" (107)             compter "to count" (113)
pratique "practical" (94)    exemple "for example" (107)           dire "to say" (113)
seulement "only" (94)        demi "half" (108)                     nombrer "count" (115)
double "double" (97)         garder "to keep" (109)                raison "calculation, problem" (116)
multiplication (99)          science "science" (109)               donner "to give" (117)
reduire "to reduce" (103)    pouvoir "can" (111)                   ensemble "together" (117)
regle "rule" (103)           se "if" (111)                         valoir "to be worth" (119)


Figure 10: The 30 ficklest words among the 219 studied. For each word, the number in parentheses is the number of
non-fickle pairs it belongs to.
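The ranking shown in Figure 10 can be sketched as follows: for each item, count the pairs whose stability index falls outside the interval $[A - B, A + B]$ (the non-fickle pairs), then keep the items with the fewest such pairs. The function names and the toy matrix are illustrative assumptions; the thresholds in the example are those of a 10 x 10 map with an assumed number of 40 runs.

```python
def non_fickle_counts(M, a, b):
    """For each item i, count the pairs (i, j), j != i, whose stability index
    lies outside [a - b, a + b], i.e. the non-fickle pairs."""
    n = len(M)
    return [
        sum(1 for j in range(n) if j != i and not (a - b <= M[i][j] <= a + b))
        for i in range(n)
    ]

def ficklest(names, M, a, b, k):
    """Return the k items with the fewest non-fickle pairs, ficklest first."""
    counts = non_fickle_counts(M, a, b)
    order = sorted(range(len(names)), key=lambda i: counts[i])
    return [(names[i], counts[i]) for i in order[:k]]

# Toy stability matrix for three items (made-up values).
M_toy = [[1.0, 0.5, 0.1],
         [0.5, 1.0, 0.0],
         [0.1, 0.0, 1.0]]
ranking = ficklest(["x", "y", "z"], M_toy, a=0.09, b=0.0887, k=2)
```

Sorting by the number of non-fickle pairs rather than thresholding on the number of fickle pairs is equivalent here, since every item belongs to the same total number of pairs.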


2.4. Graph of robust neighborhood

Let us have a different look at the neighborhood matrix $(f_{i,j})$, where $f_{i,j}$ is the frequency with which two
words belong to the same neighborhood. Instead of trying to jump right ahead and identify
fickle words in an absolute way, we can study the robust connections between words per se, in
order to produce some interesting clustering of the words.

For example, if we have a look at the excerpt from Table 8, we notice immediately that some
groups of words are very often in the same neighborhood, while their connections to the rest of the
graph are much more haphazard. This initial intuition becomes quite obvious if we reorganize the
rows and columns (following Bertin's permutation matrices idea), as we can see in Table 9.

We cannot display here the whole matrix for the 219 forms - in addition, the algorithm for 
reorganization would not be efficient enough - so we have decided to focus on a specific group of 
words: the fickle words. Indeed, the fickle words are the most difficult to study, since by definition 
they do not have a very fixed position on the Kohonen maps, and additionally it appears that they 
are not well distinguished by Factorial Correspondence Analysis either. 

Table 11 shows the frequency matrix for the 30 ficklest words. The clustering is not obvious a
priori, so we can use a different representation for better visualization of the underlying structures.
We can fix the threshold $A + B$ as defined in equation (12) and consider this matrix as the adjacency
matrix of a graph $G(V, E)$ such that:


• the set of vertices $V$ is identified with the fickle words,

• the set of edges $E$ is defined by $(i, j) \in E \Leftrightarrow f_{i,j} > A + B$.

In other terms, $G$ is the graph of highly probable neighborhood relations in Kohonen maps.

In the case of fickle words, the graph $G$ is given by Figure 12.
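Building this graph from the frequency matrix amounts to a simple thresholding. The sketch below stores $G$ as a dictionary of adjacency sets; the function name `robust_graph` and the toy data are illustrative assumptions.

```python
def robust_graph(words, F, threshold):
    """Return the adjacency sets of G: i and j are linked iff F[i][j] > threshold."""
    return {
        w: {words[j] for j in range(len(words)) if j != i and F[i][j] > threshold}
        for i, w in enumerate(words)
    }

# Toy frequency matrix for three words (made-up values); the threshold plays
# the role of A + B.
F_toy = [[1.0, 0.5, 0.1],
         [0.5, 1.0, 0.0],
         [0.1, 0.0, 1.0]]
G_toy = robust_graph(["a", "b", "c"], F_toy, threshold=0.1787)
```

Because the frequency matrix is symmetric, the resulting adjacency sets describe an undirected graph.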

2.5. Quasi-cliques 

Graphs are powerful tools for visualization, since the graphical representation can be built 
according to some parameters that ensure highly connected set of vertices to be gathered as much 
as possible. Still, it can be interesting not to rely only on graphical intuition, but also to use some 
clustering algorithms with performance guarantee. 

Since our graph is pretty dense, it appears that the concept we need here is a quasi-clique
coloring. For an introduction to quasi-clique and clique partition problems, one can for example
refer to […]. A quasi-clique is a subgraph of highest density; typically, if $h$ is a nondecreasing
function, $K \subseteq V$ is a quasi-clique according to $h$ if $|E[K]| \geq h(|K|, |V|)$. Note that we use the
following notations, which are classical in graph theory: if $W \subseteq V$ is a subset of vertices, then
$G[W]$ is the subgraph induced by $W$, and $E[W]$ is the set of edges which are internal to $G[W]$. Here
we choose $h : (|K|, |V|) \mapsto |K|(|K| - 1)/2 - 1$. In other terms, we define a quasi-clique as a subgraph
such that every pair of vertices except at most one is connected.

Table 11: Frequency of neighborhood matrix for the ficklest words only = adjacency matrix of the neighborhood
graph of the ficklest words


Unfortunately, finding a maximum quasi-clique is NP-hard in the general case as well as with
this specific function. Still, since the graph is small and has bounded degree, we can afford to use
moderately exponential algorithms (see […]). We build a partition of the graph as follows:


Algorithm Glutton Quasi-Clique Decomposition(G)

    if κ = {H ⊆ V : |H| ≥ 4 and |E[H]| ≥ |H|(|H| − 1)/2 − 1} ≠ ∅ then
        Find K which is maximum among κ
        Return (K, GLUTTON QUASI-CLIQUE DECOMPOSITION(G[V \ K]))
    else
        Return V


This formal definition can be rephrased with a simple explanation: the algorithm will look for 
the maximum size quasi-clique, add it as an item of the partition and proceed recursively until the 
graph contains no quasi-clique of size 4 or more. The remaining vertices are left isolated in the 
decomposition (note that a quasi-clique of size 3 is simply a path, and thus not very interesting to 
study). 
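
The recursive description above can be sketched as follows. This is an illustrative brute-force version under stated assumptions: it enumerates vertex subsets instead of using the moderately exponential algorithms mentioned earlier, so it is only practical for very small graphs, and the helper names and the toy graph are hypothetical.

```python
from itertools import combinations


def adjacency(edges):
    """Build a dict vertex -> set of neighbors from a list of undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def max_quasi_clique(vertices, adj, min_size=4):
    """Largest subset of `vertices` missing at most one internal edge, or None."""
    vs = sorted(vertices)
    for size in range(len(vs), min_size - 1, -1):
        for subset in combinations(vs, size):
            inside = sum(1 for u, v in combinations(subset, 2) if v in adj[u])
            if inside >= size * (size - 1) // 2 - 1:
                return set(subset)
    return None


def glutton_decomposition(vertices, adj, min_size=4):
    """Greedily peel off maximum quasi-cliques; return (parts, leftover vertices)."""
    remaining, parts = set(vertices), []
    while True:
        best = max_quasi_clique(remaining, adj, min_size)
        if best is None:
            return parts, remaining
        parts.append(best)
        remaining -= best


# Hypothetical toy graph: {1..5} minus the edge (4, 5), a 4-clique {6..9},
# a bridge (5, 6) and a pendant vertex 10.
edges = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5),
         (5, 6), (6, 7), (6, 8), (6, 9), (7, 8), (7, 9), (8, 9), (9, 10)]
adj = adjacency(edges)
parts, leftover = glutton_decomposition(adj.keys(), adj)
print(parts, leftover)
```

The recursion of the pseudocode is written here as a loop, which is equivalent because each step only depends on the vertices that remain.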

Of course the difficult point is the computation of $\kappa$ at each step. There are basically two
solutions, depending on the density of the graph.

If the graph $G(V, E)$ has high average degree $\delta_G = 2|E|/|V|$, then its complementary graph
$\bar{G}(V, \binom{V}{2} \setminus E)$ has low average degree $\delta_{\bar{G}} = |V| - 1 - \delta_G$, and thus the algorithm from is efficient.
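
The relation between the two average degrees follows from counting edges. Writing $\bar{G}$ for the complementary graph and $\delta$ for the average degree (notation assumed here for the derivation), the complementary graph contains every pair of vertices that is not an edge of $G$, so that:

\[
\delta_{\bar{G}} = \frac{2\left(\frac{|V|(|V|-1)}{2} - |E|\right)}{|V|} = (|V| - 1) - \frac{2|E|}{|V|} = |V| - 1 - \delta_G .
\]

For instance, a graph on 10 vertices with average degree 7 has a complementary graph of average degree 2.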

Figure 12: Graph of the relations between fickle words. Two nodes are connected if the words are significantly
neighbors.

On the other hand, if the graph has low average degree, we can use the following algorithm,
that basically solves the quasi-clique problem through the resolution of a small (quadratic) number
of clique problems:


Algorithm QUASI CLIQUE(V, E)

    K = CLIQUE(V, E)
    for all $u, v \in V$ with $(u, v) \notin E$ do
        K = max{K, CLIQUE(V, $E \cup \{(u, v)\}$)}
    return K


Here CLIQUE can be any exact algorithm for the maximum clique problem, which is NP-hard
too. To our knowledge, the fastest ones are those designed in [23].
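
The reduction to clique problems can be sketched as follows. Note the assumption: CLIQUE is replaced here by a plain Bron-Kerbosch enumeration without pivoting, chosen for brevity and not the fast algorithms referred to above; the function names and the toy graph are hypothetical.

```python
from itertools import combinations


def adjacency(edges):
    """Build a dict vertex -> set of neighbors from a list of undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def max_clique(adj):
    """Maximum clique by basic Bron-Kerbosch (exponential time, small graphs only)."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:          # r is a maximal clique
            if len(r) > len(best):
                best = set(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best


def max_quasi_clique_via_cliques(adj):
    """Largest vertex set missing at most one internal edge, via clique calls."""
    best = max_clique(adj)
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue
        # Temporarily add the missing edge (u, v), solve a clique problem, undo.
        adj[u].add(v)
        adj[v].add(u)
        candidate = max_clique(adj)
        adj[u].discard(v)
        adj[v].discard(u)
        if len(candidate) > len(best):
            best = candidate
    return best


# Hypothetical toy graph: {1..5} minus the edge (4, 5), a 4-clique {6..9},
# a bridge (5, 6) and a pendant vertex 10.
edges = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5),
         (5, 6), (6, 7), (6, 8), (6, 9), (7, 8), (7, 9), (8, 9), (9, 10)]
adj = adjacency(edges)
print(max_quasi_clique_via_cliques(adj))
```

A clique of the graph augmented with the edge (u, v) misses at most that one edge in the original graph, which is why one clique computation per non-adjacent pair is enough.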


3. Analysis of results 

3.1. Robust Kohonen maps 

In what follows, we consider the Kohonen map after removing the fickle words (in gray), see
Figure 13. We call this modified map a Robust Kohonen map.

The Robust Kohonen map shows a contrast between the top right corner and the bottom left
one, the same contrast as between the left and right sides of the first axis in the FCA representation (see
Figure 14).

Indeed, the top right corner contains words linked to arithmetic practice and verbs that are used
to build arithmetical operations, such as retenir "to retain", emprunter "to borrow", etc. No manuscript
is specific to this part of the map, even if the BNF fr. 2050 and Kadran aux Marchans appear in
the lower part of the graph.

On the bottom left corner, one finds the lexical inheritance of the medieval university, represented
by the manuscript BNF fr. 1339. Some texts contain a highly specialized vocabulary, with
connections to the university world, with words such as article "article", algorism "algorithm",
sain "integer", or a vocabulary of geometry used for the extraction of roots (carree).

The other two corners of the map are characterized by vocabulary taken from two specific
manuscripts. The BSG 3143, in the upper left corner, is a treatise written by Jean Adam for the future
Louis XI. It is exceptional in the corpus because it uses Latin words and Roman numerals, and
also because it had to be pleasant for the prince. In spite of this, it shares with the Nantes 456
and BNF fr. 1339 words such as gecter, gectons that are marks of abacus algorithms.

In the opposite corner, the Traicte de la pratique is marked by a more descriptive vocabulary
of mathematical problems (item "item", demande "demand", requerir "to call for", quant "quant")
and a stronger scientific approach (aliquot "aliquot", corps "field", proportionnellement "proportionally").

What can we do with the list of "fickle words" from this map? First, it is remarkable that
some of the fickle words concern the algorithm of the rule of three. This algorithm consists of a
"multiplication" (multiplier) by the "opposite" (contraire) and of a "division" (diviser). Other
fickle words are related to operations (reduction "reducing fractions", multiplier "to multiply",
additionner "to add"), or have a distinctive didactic flavor (falloir "to have to", dire
"to say"). As a matter of fact, the two main technical issues for these XVth century authors are to
teach their readers how to use the rule of three and fractions.

3.2. Improved visualization for FCA 

The combination of the two techniques, FCA and SOM, whose result is displayed in Figures 14
and 15, is interesting because it preserves the properties of the FCA while giving additional information
about the center of the projection, which is usually difficult to interpret. Indeed, identifying
the fickle words on the FCA projections allows us to improve the general interpretation of the
factorial graphs, where some words are located because of the algorithm and not because of their
attraction to other words and to the texts.

Remember that, on the first two factorial axes (see Figure 14), we observed an opposition
between the university legacy, on the right, and a more practical pole with rules, problems and
fractions, on the left. It could be tempting to support this observation with very significant words
such as pratique "practical" or regle de trois "rule of three". Still, highlighting the fickle
forms on the FCA shows that these words are in fact shared between many different texts and not
only linked to the more 'practical' ones: Nicolas Chuquet and Traicte en la praticque. As a matter
of fact, they belong to all the texts.

It is the same for two other words (raison "reason", dire "to say"). The word raison is
ambiguous, in a way, because it can mean calculation, with textual matches like "do your
reasons", or indicate mathematical problems. "To say" ranks sixth by order of frequency among
verbs in the corpus. Note that not all of the most important verbs are fickle words. The first eight verbs by
number of occurrences are: etre (14523) "to be", avoir (Ahhd) "to have", devoir (3826) "must", faire (3431) "to
do", multiplier (8228) "to multiply", dire (2461) "to say", partir (2648) "to divide", valoir (2606)
"to be worth". One of the particular meanings of "to say" comes from the orality of this type of
text. Understanding arithmetic operations often supposes saying them aloud.

Another interest of this kind of representation is the interpretation of the center of FCA. As 
we can see in Figure [121 fickle words are close to the center, but there are other words in the same 
place, which we could interpret. 

To conclude, we can see that two levels of interpretation are superimposed: the fickle pairs
reveal the shared lexicon, and the factorial map inserts them in a local interaction system. And
since the list of fickle words is computed independently from the FCA, we can successively study
these interactions on each axis. It is the articulation between these two levels which makes this
representation interesting. In the end, the meaning of this new kind of factorial map is quite
intuitive and offers simple tools for argumentation.
Figure 13: Robust Kohonen map: fickle words are removed (in gray)


3.3. Neighborhood graphs 

The Bertin matrix (see Table 17) shows some remarkable clustering among fickle words. The
question is now to produce some interpretation of this clustering.

First, we can observe that the Bertin matrix displays four groups along its diagonal, from the
most connected on the top left to the least connected on the bottom right.

The first list (contraire, depenser, falloir, racine, meme, demi, savoir; see Figure 10 for a
translation) is a collection of rather heterogeneous words. There are frequently used words such as
demi and others bearing a strong polysemy such as falloir.

A possible explanation may be that these groups of words form phrases in the corpus (what is
usually called textual co-occurrence): the words are spaced only one or two words from each other,
and these associations are reflected in the table. In that case, fickle words can be used as a
tool to extract topoi. Thus, for example, savoir "to know" and contraire "contrary" are often used
in phrases such as savoir par son contraire "to know something through its contrary".

Figure 14: Projection on first two factors of the FCA. Only the fickle words are in black.

Figure 15: Projection on third and fourth factors of the FCA. Only the fickle words are in black.

Figure 16: Correlation between fickleness and distance to the center. The x-axis represents the number of fickle pairs a
word belongs to, while the y-axis stands for the squared distance to the origin.

We can also think that these clustering properties reveal more distant co-occurrences, that
is, words appearing in the same sentence or paragraph, but not necessarily in the same topos. For
instance, falloir "to have to do" has many such co-occurrences with fickle words like reduire "to
reduce", racine "root" and savoir "to know". In this configuration, we can in fact suppose that
all these words are shared by the same sentences.
Table 17: Same as Table 11, reorganized and with shades proportional to value.


On the contrary, demi, which is part of the same well-connected group (according to the
clustering), does not have any specific co-occurrences in the texts with any word in this group.
This is especially interesting, since it reveals the existence of connections that could not have been
deduced from a simple study of co-occurrence with classical tools.

Figure 18 opens another perspective. Indeed, it shows the words that make the link
between clusters. These fickle words have many different affinities. We can see, for example, that
the positions of reduire "to reduce" and exemple "example" are not very surprising, because these
words are used a lot, in every text, in sentences associating them with various other fickle words,
such as "in all the examples preceding the problems", or "the problem of reducing or converting
the monetary values".

These questions are not solved yet, and the answer cannot be sure without an enlargement of 
the corpus. Indeed, we would like to test this hypothesis by using the process described here on a 
larger part of the corpus. 


Conclusion 

In this work, we have shown how to use Kohonen maps as a complement to the Factorial
Correspondence Analysis (FCA) methods classically used in lexicometry:

• to improve the information provided by the different projections of the FCA, 

• to make the Kohonen maps more robust with respect to the randomness of the SOM algorithm, 
by distinguishing stable neighbor pairs from fickle pairs, 

• to build graphs of connections between fickle words which are difficult to analyze by either
FCA or Kohonen maps alone.
Figure 18: Glutton decomposition in quasi-cliques of maximum size.


We think that it will be interesting to use this methodology on a large variety of corpora, such
as political speeches, chivalric culture [2g texts and scientific articles.